The data I am using can be found at https://www.basketball-reference.com/leagues/NBA_2020_totals.html
The data contains individual player statistics for the NBA. It has 30 attributes that measure overall player performance for every regular NBA season from 2015 to 2020.
I have been watching NBA basketball since childhood and am a huge fan of the Chicago Bulls, which led me to choose the overall NBA player statistics data. My initial intuition is that over the past few years, players who score more from 3-pointers have been more likely to be on teams that make the NBA playoffs and win the Larry O'Brien Championship Trophy.
The goal of this project is to confirm or refute my hunch that shooting 3-pointers is a better strategy than shooting 2-pointers. Based on my initial intuition, I suspect that players who score more in the 3-pointer categories are more likely to lead in total points and to be on teams that make the NBA playoffs. To answer this question, we need to analyze the data from the 2015-2020 NBA seasons and look at other player attributes such as Position, Team, Games, Minutes Played, and Field Goal Percentage. Because the NBA player statistics data does not contain playoff information, I will create reference data of the teams that reached the playoffs each year and merge it with the dataset.
import pandas as pd
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
nba_stats_season1 = pd.read_csv('nba_stats_2015_2016.csv')
# Use info() to inspect the column names, the number of non-null values,
# and the data type of each column.
# Use head() and tail() to inspect the first and last five rows of the data.
nba_stats_season1.info()
nba_stats_season1.head()
nba_stats_season1.tail()
nba_stats_season2 = pd.read_csv('nba_stats_2016_2017.csv')
nba_stats_season2.info()
nba_stats_season2.head()
nba_stats_season2.tail()
nba_stats_season3 = pd.read_csv('nba_stats_2017_2018.csv')
nba_stats_season3.info()
nba_stats_season3.head()
nba_stats_season3.tail()
nba_stats_season4 = pd.read_csv('nba_stats_2018_2019.csv')
nba_stats_season4.info()
nba_stats_season4.head()
nba_stats_season4.tail()
nba_stats_season5 = pd.read_csv('nba_stats_2019_2020.csv')
nba_stats_season5.info()
nba_stats_season5.head()
nba_stats_season5.tail()
nba_playoffs_data = pd.read_csv('nba_playoffs_data.csv')
nba_playoffs_data.info()
nba_playoffs_data.head()
nba_playoffs_data.tail()
I hope to find out whether there is a positive or negative correlation between how many points NBA players average from 3-pointers and 2-pointers and their teams' chances of making the NBA playoffs. I also want to explore how player position relates to the shooting categories.
My hunch is that players who make more 3-pointers per game have strong offensive statistics in categories such as '3Pointers', 'FieldGoalsPerGame', and 'TotalPointsPerGame', and are more likely to lead in 'TotalPoints' and to be on a 'Team' that makes the 'Playoff' in a given season than players who score mostly 2-pointers. I also expect Shooting Guards (SG) and Point Guards (PG) to make more 3-pointers per game and to lead in total points per season, while Centers (C), Small Forwards (SF), and Power Forwards (PF) are likely to make more 2-pointers per game.
The population represented is overall NBA player statistics for 2015-2020, together with reference data showing which players were on teams that made the playoffs in those seasons.
The sample size is 3196 rows and 37 columns, which contain NBA player statistics and playoff reference data from 2015-2020.
The data was collected from https://www.basketball-reference.com/, a publicly available source for a variety of NBA stats. Our data focuses on overall NBA player statistics for the 2015-2020 seasons and was downloaded from https://www.basketball-reference.com/leagues/NBA_2020_totals.html
This is not a random sample, and sampling weights are not used with the data.
| # | Variable | Definition | DataType | Will be Used |
|---|---|---|---|---|
| 1 | Rk | Rank of the player in each season based on overall statistics | Integer | |
| 2 | Player | Name of the NBA player | String | X |
| 3 | Pos | Position the player plays | String | X |
| 4 | Age | Age of the player | Integer | X |
| 5 | Tm | Team of the player | String | X |
| 6 | G | Number of games played by the player in that season | Integer | X |
| 7 | GS | Number of games started by the player in that season | Integer | |
| 8 | MP | Number of minutes played by the player in that season | Integer | X |
| 9 | FG | Number of field goals the player made. This includes both 2-pointers and 3-pointers | Integer | X |
| 10 | FGA | Number of field goals the player attempted. This includes both 2-pointers and 3-pointers | Integer | X |
| 11 | FG% | Percentage of field goal attempts the player made in that season | Float | X |
| 12 | 3P | Number of 3-point field goals the player made in that season | Integer | X |
| 13 | 3PA | Number of 3-point field goals the player attempted in that season | Integer | X |
| 14 | 3P% | Percentage of 3-point field goal attempts the player made in that season | Float | X |
| 15 | 2P | Number of 2-point field goals the player made in that season | Integer | X |
| 16 | 2PA | Number of 2-point field goals the player attempted in that season | Integer | X |
| 17 | 2P% | Percentage of 2-point field goal attempts the player made in that season | Float | X |
| 18 | eFG% | Effective field goal percentage, which adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal | Float | X |
| 19 | FT | Number of free throws the player made in that season | Integer | |
| 20 | FTA | Number of free throws the player attempted in that season | Integer | |
| 21 | FT% | Percentage of free throw attempts the player made in that season | Float | |
| 22 | ORB | Number of offensive rebounds the player collected in that season | Integer | |
| 23 | DRB | Number of defensive rebounds the player collected in that season | Integer | |
| 24 | TRB | Total number of rebounds the player collected in that season | Integer | |
| 25 | AST | Number of assists in that season. An assist is a pass to a teammate that leads directly to a basket | Integer | |
| 26 | STL | Number of steals in that season. A steal occurs when a defensive player takes the ball from a player on offense | Integer | |
| 27 | BLK | Number of blocks in that season. A block occurs when an offensive player attempts a shot and a defensive player tips the ball, preventing the score | Integer | |
| 28 | TOV | Number of turnovers in that season. A turnover occurs when the player on offense loses the ball to the defense | Integer | |
| 29 | PF | Number of personal fouls the player committed in that season | Integer | |
| 30 | PTS | Number of points the player scored in that season | Integer | X |
| 31 | Year | Reference column that tracks which season each row of player statistics comes from | String | X |
| 32 | Playoff | Reference column that tracks whether the player's team made the playoffs in the corresponding season | String | X |
| 33 | MinutesPerGame | Calculated column: 'MinutesPlayed' divided by 'GamesPlayed'. The source columns hold season totals, so per-game values make the data easier to analyze and interpret | Float | X |
| 34 | FieldGoalsPerGame | Calculated column: 'FieldGoals' divided by 'GamesPlayed'. The source columns hold season totals, so per-game values make the data easier to analyze and interpret | Float | X |
| 35 | 3PointerPerGame | Calculated column: '3Pointers' divided by 'GamesPlayed'. The source columns hold season totals, so per-game values make the data easier to analyze and interpret | Float | X |
| 36 | 2PointerPerGame | Calculated column: '2Pointers' divided by 'GamesPlayed'. The source columns hold season totals, so per-game values make the data easier to analyze and interpret | Float | X |
| 37 | TotalPointsPerGame | Calculated column: 'TotalPoints' divided by 'GamesPlayed'. The source columns hold season totals, so per-game values make the data easier to analyze and interpret | Float | X |
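The eFG% definition in the table can be made concrete with a small worked example. This is a minimal sketch that assumes the commonly used formula eFG% = (FG + 0.5 × 3P) / FGA; the function name `effective_fg_pct` and the season line are hypothetical and not part of the dataset.

```python
def effective_fg_pct(fg: int, three_p: int, fga: int) -> float:
    """Effective field goal percentage: (FG + 0.5 * 3P) / FGA."""
    if fga == 0:
        # Undefined when no shots were attempted (shows up as a missing value).
        return float("nan")
    return (fg + 0.5 * three_p) / fga


# Hypothetical season line: 400 made field goals, 100 of them 3-pointers, on 900 attempts.
example = effective_fg_pct(fg=400, three_p=100, fga=900)
print(round(example, 3))
```

Each made 3-pointer counts as 1.5 made field goals, which is why eFG% is a convenient single column for comparing players who rely on different shot types.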
The first step is to add a Year column to each season's NBA statistics dataframe. The second step is to concatenate all the season dataframes into one large dataset, the nba_players_stats dataframe. The third step is to merge nba_players_stats with the nba_playoffs_data dataframe using an outer join on the Year and Tm (team) columns, so that after the merge we can see which teams made the playoffs in which year. The joined table is assigned to overall_nba_playoffs_stats.
nba_stats_season1['Year'] = "2015-2016" #Adding Year column to this dataframe since data represents 2015-2016 stats
nba_stats_season1.head(5)
nba_stats_season2['Year'] = "2016-2017" #Adding Year column to this dataframe since data represents 2016-2017 stats
nba_stats_season2.head(5)
nba_stats_season3['Year'] = "2017-2018" #Adding Year column to this dataframe since data represents 2017-2018 stats
nba_stats_season3.head(5)
nba_stats_season4['Year'] = "2018-2019" #Adding Year column to this dataframe since data represents 2018-2019 stats
nba_stats_season4.head(5)
nba_stats_season5['Year'] = "2019-2020" #Adding Year column to this dataframe since data represents 2019-2020 stats
nba_stats_season5.head(5)
We are going to concatenate all the NBA season dataframes, each now with a Year column, into one large dataset assigned to the variable nba_players_stats below.
nba_players_stats = pd.concat([nba_stats_season1, nba_stats_season2, nba_stats_season3, nba_stats_season4, nba_stats_season5])
nba_players_stats.shape
nba_players_stats.info()
nba_players_stats.head()
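The tag-then-concatenate steps can also be written as one loop. The sketch below uses two tiny made-up frames in place of the real season files (the players, teams, and point totals are hypothetical) and passes `ignore_index=True`, an optional choice that gives the combined frame a fresh row index.

```python
import pandas as pd

# Tiny stand-ins for the per-season frames (hypothetical values, not real stats).
season_frames = {
    "2015-2016": pd.DataFrame({"Player": ["A", "B"], "Tm": ["CHI", "GSW"], "PTS": [100, 200]}),
    "2016-2017": pd.DataFrame({"Player": ["A", "C"], "Tm": ["CHI", "BOS"], "PTS": [150, 50]}),
}

# assign() returns a copy with the new Year column, leaving the inputs untouched.
tagged = [frame.assign(Year=year) for year, frame in season_frames.items()]

# ignore_index=True renumbers rows 0..n-1 instead of repeating each
# season's original row labels in the combined frame.
combined = pd.concat(tagged, ignore_index=True)
print(combined)
```

Without `ignore_index=True` the row labels restart at 0 for every season, which matters only if rows are later selected or dropped by index label before a merge resets the index.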
Step 1: We merge the concatenated nba_players_stats dataframe with the nba_playoffs_data reference dataframe, which includes the Year, Team, and Playoff columns. Playoff is 'Y' for every row in the reference data because it lists only teams that made the playoffs.
Step 2: We use an outer join on the Year and Team columns. It returns all rows from the left dataframe and all rows from the right dataframe, matched on Year and Team, with Playoff = 'Y' for teams that made the playoffs and NaN for teams that did not.
overall_nba_playoffs_stats = pd.merge(nba_players_stats, nba_playoffs_data, how="outer", on=["Year", "Tm"])
overall_nba_playoffs_stats.shape
overall_nba_playoffs_stats.info()
overall_nba_playoffs_stats
We check the rows in the dataframe below to see which teams did not make the playoffs. These rows have NaN in the Playoff column after the outer join, so we set them to Playoff = 'N', meaning that these NBA players and teams did not make the playoffs in that year according to the five seasons of historical data.
overall_nba_playoffs_stats.loc[overall_nba_playoffs_stats.Playoff != "Y", "Playoff"] = "N"
overall_nba_playoffs_stats
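The flagging logic can be checked on toy data. The sketch below is a variant, not the notebook's exact call: it assumes every row of interest comes from the player table, so it uses `how="left"`, and it adds `validate="many_to_one"` so pandas raises an error if a (Year, Tm) pair appears more than once in the reference table. The frames and team codes are made up for illustration.

```python
import pandas as pd

# Hypothetical player rows for one season.
players = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Tm": ["CHI", "GSW", "NYK"],
    "Year": ["2015-2016", "2015-2016", "2015-2016"],
})

# Hypothetical reference table: only playoff teams are listed, all flagged 'Y'.
playoffs = pd.DataFrame({
    "Tm": ["GSW"],
    "Year": ["2015-2016"],
    "Playoff": ["Y"],
})

# Keep every player row; teams missing from the reference table get NaN.
merged = players.merge(playoffs, how="left", on=["Year", "Tm"], validate="many_to_one")

# Turn the NaN flags into an explicit 'N'.
merged["Playoff"] = merged["Playoff"].fillna("N")
print(merged)
```

With a left join, a reference row that matches no player (for example a misspelled team code) is silently dropped, whereas the outer join keeps it as a mostly empty row; which behavior is preferable depends on whether such mismatches should be surfaced.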
The query below shows that 1491 NBA players were on teams that made the playoffs and 1705 NBA players were on teams that did not make the playoffs over the 2015-2020 NBA seasons.
overall_nba_playoffs_stats.Playoff.str.contains("Y").sum()
overall_nba_playoffs_stats.Playoff.str.contains("N").sum()
overall_nba_playoffs_stats.shape
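Both counts can also be read from a single call. This is a small sketch on a made-up Series standing in for the Playoff column; `value_counts()` tallies each distinct flag.

```python
import pandas as pd

# Hypothetical flags standing in for the Playoff column.
flags = pd.Series(["Y", "N", "N", "Y", "N"], name="Playoff")

# One row per distinct value, sorted by frequency.
counts = flags.value_counts()
print(counts)

# The counts should add up to the number of rows.
print(counts.sum() == len(flags))
```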
The variable below lists all the columns to drop because they are not relevant to the data analysis.
col_to_drop = ['Rk', 'GS', 'FT', 'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF']
The new dataframe, overall_nba_playoffs_stats2, uses the col_to_drop variable to drop all 13 irrelevant columns from the table.
overall_nba_playoffs_stats2 = overall_nba_playoffs_stats.drop(columns=col_to_drop, inplace=False)
overall_nba_playoffs_stats2
Next, we check the shape of the table: only 19 of the 32 columns remain for further analysis.
overall_nba_playoffs_stats2.shape
We rename the columns so that it is easier to understand what each one means. The list overall_nba_playoffs_stats2_cols holds the new names for the remaining 19 columns, in order. We then confirm that the columns have been renamed in the main dataframe.
overall_nba_playoffs_stats2_cols = ['Player', 'Position','Age', 'Team','GamesPlayed', 'MinutesPlayed','FieldGoals', 'FieldGoalsAttempts','FieldGoals%', '3Pointers', '3Pointer_Attempts', '3Pointers%', '2Pointers', '2Pointer_Attempts', '2Pointers%', 'EffectiveFieldGoals%', 'TotalPoints', 'Year','Playoff']
overall_nba_playoffs_stats2.columns = overall_nba_playoffs_stats2_cols
overall_nba_playoffs_stats2.columns
overall_nba_playoffs_stats2
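Assigning a list to `.columns` relies on the new names being in exactly the same order as the existing columns. A sketch of an alternative, assuming the original column abbreviations shown in the variable table, is to rename through an explicit mapping; the one-row frame below is made up for illustration and covers only a few of the 19 columns.

```python
import pandas as pd

# Hypothetical one-row frame using the source abbreviations.
raw = pd.DataFrame({"Player": ["A"], "Pos": ["SG"], "Tm": ["CHI"], "G": [82], "3P": [150]})

# Each old name is paired with its new name, so column order no longer matters.
column_names = {"Pos": "Position", "Tm": "Team", "G": "GamesPlayed", "3P": "3Pointers"}

# Columns missing from the mapping (here 'Player') keep their names.
renamed = raw.rename(columns=column_names)
print(renamed.columns.tolist())
```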
I have created five extra calculated columns, MinutesPerGame, FieldGoalsPerGame, 3PointerPerGame, 2PointerPerGame, and TotalPointsPerGame, to make the data easier to understand and interpret. They also make it easier for the reader to follow the exploratory data analysis done on these columns.
After adding them, the table has 24 columns for further analysis.
overall_nba_playoffs_stats2['MinutesPerGame'] = round(overall_nba_playoffs_stats2['MinutesPlayed'] / overall_nba_playoffs_stats2['GamesPlayed'],2)
overall_nba_playoffs_stats2['FieldGoalsPerGame'] = round(overall_nba_playoffs_stats2['FieldGoals'] / overall_nba_playoffs_stats2['GamesPlayed'],0)
overall_nba_playoffs_stats2['3PointerPerGame'] = round(overall_nba_playoffs_stats2['3Pointers'] / overall_nba_playoffs_stats2['GamesPlayed'], 0)
overall_nba_playoffs_stats2['2PointerPerGame'] = round(overall_nba_playoffs_stats2['2Pointers'] / overall_nba_playoffs_stats2['GamesPlayed'], 0)
overall_nba_playoffs_stats2['TotalPointsPerGame'] = round(overall_nba_playoffs_stats2['TotalPoints'] / overall_nba_playoffs_stats2['GamesPlayed'], 0)
overall_nba_playoffs_stats2
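The five per-game columns all follow the same pattern: a season total divided by GamesPlayed. The helper below is a sketch of that pattern on made-up numbers; the name `add_per_game` and its `decimals` parameter are hypothetical. It keeps two decimals for every column, whereas the notebook rounds most per-game columns to whole numbers.

```python
import pandas as pd

# Hypothetical season totals for three players.
stats = pd.DataFrame({
    "GamesPlayed": [82, 10, 41],
    "MinutesPlayed": [2870, 67, 1025],
    "3Pointers": [246, 3, 41],
})


def add_per_game(frame, totals_to_names, decimals=2):
    """Return a copy of frame with one per-game column per season-total column."""
    out = frame.copy()
    for total_col, per_game_col in totals_to_names.items():
        out[per_game_col] = (out[total_col] / out["GamesPlayed"]).round(decimals)
    return out


per_game = add_per_game(
    stats,
    {"MinutesPlayed": "MinutesPerGame", "3Pointers": "3PointerPerGame"},
)
print(per_game)
```

Rounding to whole numbers collapses a column such as 3PointerPerGame into a handful of distinct values, so keeping a decimal or two may preserve more detail for the later box plots and correlations.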
We have removed all irrelevant columns and renamed the remaining columns for easier interpretation. Now we need to identify the missing values in the dataframe so the data can be properly analyzed. For this step we use 'overall_nba_playoffs_stats2.isnull().sum()', which counts the missing values in each column.
overall_nba_playoffs_stats2.shape
overall_nba_playoffs_stats2.isnull().sum() #shows all missing values in each column
To clean up the missing values we take a step-by-step approach rather than calling 'dropna' on the entire dataframe. Depending on position, some players may not attempt any 3-pointers and others may not attempt any 2-pointers, which leaves the corresponding percentage columns missing. We will create a variable for the rows with missing values in each column and update the dataframe.
All rows with missing values in both the 'FieldGoals%' and 'EffectiveFieldGoals%' columns are removed from overall_nba_playoffs_stats2. The missing values in these columns mean that the player did not attempt a field goal in that season. These players also played very few games, possibly because of injury.
missing_fieldgoals_percentage = overall_nba_playoffs_stats2[overall_nba_playoffs_stats2['FieldGoals%'].isnull()]
missing_fieldgoals_percentage
overall_nba_playoffs_stats2 = overall_nba_playoffs_stats2.drop(missing_fieldgoals_percentage.index)
overall_nba_playoffs_stats2.isnull().sum() #shows all missing values in each column
All rows with missing values in the '3Pointers%' and '2Pointers%' columns of the overall_nba_playoffs_stats2 dataframe are checked. As mentioned earlier, depending on position some players may not attempt 3-pointers and others may not attempt 2-pointers, which leaves these percentage columns missing. These rows are not dropped. Instead, we replace every missing value with 0 using 'overall_nba_playoffs_stats2.fillna(0, inplace=True)'.
overall_nba_playoffs_stats2.shape #used to confirm rows dropped
overall_nba_playoffs_stats2.isnull().sum() #used to reveal remaining rows with missing values
overall_nba_playoffs_stats2.fillna(0, inplace=True)
overall_nba_playoffs_stats2.isnull().sum() #used to reveal remaining rows with missing values
overall_nba_playoffs_stats2.shape
overall_nba_playoffs_stats2.dtypes
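The two-stage clean-up (drop rows with no field goal percentage, then fill the remaining percentage gaps with 0) can be sketched on toy data. The frame below is made up; the variant shown uses `dropna(subset=...)` and fills only the named percentage columns, so a missing value in any other column would stay visible instead of becoming 0.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: B attempted no shots at all, C attempted no 3-pointers.
shots = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "FieldGoals%": [0.45, np.nan, 0.60],
    "3Pointers%": [0.38, np.nan, np.nan],
    "2Pointers%": [0.50, np.nan, 0.60],
})

# Stage 1: drop only the rows with no field goal percentage at all.
cleaned = shots.dropna(subset=["FieldGoals%"]).copy()

# Stage 2: a missing shot-type percentage means no attempts of that type, so fill with 0.
pct_cols = ["3Pointers%", "2Pointers%"]
cleaned[pct_cols] = cleaned[pct_cols].fillna(0)
print(cleaned)
```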
A boxplot is created for 'GamesPlayed' and 'Age' to reveal any significant outliers.
overall_nba_playoffs_stats2.boxplot(column=['GamesPlayed', 'Age'])
overall_nba_playoffs_stats2[['GamesPlayed', 'Age']].describe(percentiles = [.25, .5, .75, .95])
Explaining Box Plot in GamesPlayed Column: The GamesPlayed box plot displays the five-number summary. The lower whisker is the minimum number of games played by an NBA player, which is 1. Similarly, the upper whisker is the most games played by an NBA player in a season, which is 82. The lower quartile shows that 25% of NBA players played fewer than 20 games. The upper quartile shows that 75% of NBA players played fewer than 68 games. The median, the line inside the box, shows that the typical NBA player played 48 games. There are no significant outliers in the GamesPlayed box plot.
Explaining Box Plot in Age Column: The Age box plot displays the five-number summary. The lower whisker is the youngest NBA player in our data, age 19. Similarly, the upper whisker is at age 39. The lower quartile shows that 25% of NBA players are younger than 23. The upper quartile shows that 75% of NBA players are younger than 29. The median shows that the typical NBA player is 26. The outliers in the Age box plot are plotted as individual dots in line with the whiskers. The upper whisker ends at age 39, and the ages beyond it, up to a maximum of 43, are outliers that will be removed from our data.
The shape of the dataframe shows 3179 rows before dropping NBA players older than 39.
print("Removing Outliers from Age Column")
overall_nba_playoffs_stats2.shape
ageover39 = overall_nba_playoffs_stats2[overall_nba_playoffs_stats2.Age > 39].index
overall_nba_playoffs_stats2.drop(ageover39, inplace=True) #function that drops all players with age above 39
overall_nba_playoffs_stats2.shape
The shape of the dataframe shows 3172 rows after dropping NBA players older than 39.
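The whisker positions in a box plot can be reproduced numerically. The sketch below assumes the default rule used by matplotlib box plots, where points more than 1.5 × IQR beyond the quartiles are drawn as outliers; the ages are made up for illustration.

```python
import pandas as pd

# Hypothetical ages, with one unusually old player.
ages = pd.Series([19, 21, 23, 24, 26, 27, 29, 31, 34, 43], name="Age")

q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Points outside these fences are drawn as individual dots.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = ages[(ages < lower_fence) | (ages > upper_fence)]
print(lower_fence, upper_fence)
print(outliers.tolist())
```

Each whisker is drawn at the most extreme data point that still lies inside its fence, so the whisker end is an observed value while the fence itself usually is not.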
overall_nba_playoffs_stats2['MinutesPerGame'].describe(percentiles = [.25, .5, .75, .95])
overall_nba_playoffs_stats2.boxplot(column=['MinutesPerGame'])
Explaining Box Plot in MinutesPerGame Column: The MinutesPerGame box plot displays the five-number summary. The lower whisker is the minimum minutes per game played by an NBA player, which is 0.67 minutes. Similarly, the upper whisker is the maximum minutes per game played by an NBA player, which is 42 minutes. The lower quartile shows that 25% of NBA players played fewer than 12.18 minutes per game. The upper quartile shows that 75% of NBA players played fewer than 26.70 minutes per game. The median shows that the typical NBA player played 19.28 minutes per game. There are no significant outliers in the MinutesPerGame box plot.
I was curious to see which NBA player played the fewest minutes per game in the data. I queried that data below because such a low minutes-per-game figure did not seem realistic for an NBA season.
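The MinutesPerGame values are decimal minutes, so the digits after the decimal point are a fraction of a minute, not seconds. A minimal sketch of the conversion to minutes and seconds follows; the helper name `minutes_to_clock` is hypothetical.

```python
def minutes_to_clock(decimal_minutes: float) -> str:
    """Convert decimal minutes (for example 12.18) to a 'M:SS' string."""
    total_seconds = round(decimal_minutes * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


# The three quartile values quoted for MinutesPerGame.
for value in (12.18, 19.28, 26.70):
    print(value, "->", minutes_to_clock(value))
```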
overall_nba_playoffs_stats2.shape
minutespergame = overall_nba_playoffs_stats2[overall_nba_playoffs_stats2.MinutesPerGame <= 0.67]
minutespergame
overall_nba_playoffs_stats2['FieldGoalsPerGame'].describe(percentiles = [.25, .5, .75, .95])
overall_nba_playoffs_stats2.boxplot(column=['FieldGoalsPerGame'])
Explain Box Plot in FieldGoalsPerGame Column: The FieldGoalsPerGame box plot displays the five-number summary. The lower whisker is the minimum field goals per game by an NBA player, which is 0. This column includes both three-point and two-point shots, and depending on position a player may shoot mostly one or the other. Similarly, the upper whisker is at 8 field goals per game. The lower quartile shows that 25% of NBA players make fewer than 1 field goal per game. The upper quartile shows that 75% of NBA players make fewer than 4 field goals per game. The median shows that the typical NBA player makes 3 field goals per game. Even though the FieldGoalsPerGame box plot shows outliers, I will keep them because, as mentioned above, field goals depend heavily on a player's position and are central to the analysis of my hypothesis.
The dataframe still has 3172 rows after analyzing the FieldGoalsPerGame box plot.
overall_nba_playoffs_stats2.shape
overall_nba_playoffs_stats2.boxplot(column=['3PointerPerGame', '2PointerPerGame', 'TotalPointsPerGame'])
overall_nba_playoffs_stats2[['3PointerPerGame', '2PointerPerGame', 'TotalPointsPerGame']].describe(percentiles = [.25, .5, .75, .95])
Explain Box Plot in 3PointerPerGame Column: The 3PointerPerGame box plot displays the five-number summary. The minimum is 0 three-pointers per game because, depending on position, some players shoot almost only 2-pointers. The maximum is 5 three-pointers per game made by an NBA player. The lower quartile is 0, meaning at least 25% of NBA players average no three-pointers per game. For instance, players at positions such as Center or Power Forward mostly do not shoot 3-pointers in a game. The upper quartile shows that 75% of NBA players make 1 or fewer three-pointers per game. The median shows that the typical NBA player makes 1 three-pointer per game in a regular season.
Explain Box Plot in 2PointerPerGame Column: The 2PointerPerGame box plot displays the five-number summary. The minimum is 0 two-pointers per game because, depending on position, some players shoot almost only 3-pointers. The maximum is 10 two-pointers per game made by an NBA player. The lower quartile shows that 25% of NBA players make 1 or fewer two-pointers per game. The upper quartile shows that 75% of NBA players make 3 or fewer two-pointers per game. The median shows that the typical NBA player makes 2 two-pointers per game in a regular season.
Explain Box Plot in TotalPointsPerGame Column: The TotalPointsPerGame box plot displays the five-number summary. The minimum is 0 total points per game, which can happen when a player barely plays, for example because of injury. The maximum is 36 total points per game scored by an NBA player. The lower quartile shows that 25% of NBA players score 4 or fewer total points per game. The upper quartile shows that 75% of NBA players score 11 or fewer total points per game. The median shows that the typical NBA player scores 7 total points per game in a regular season.
Keep in mind that this is historical data from the 2015-2020 NBA seasons, so the values in the columns above can fluctuate. Also, even though there are outliers in these columns, we will keep them because these columns play a vital role in analyzing whether shooting 3-pointers is a better way to make the NBA playoffs than shooting 2-pointers.
We reorder the columns in the table so that it looks organized and is easier for the reviewer to understand.
Below, I show the columns before reordering, where all the newly added columns are at the end of the table.
overall_nba_playoffs_stats2.columns
In this step, we move the newly added columns, which were appended at the end of the table, next to the columns they were calculated from.
overall_nba_playoffs_stats2 = overall_nba_playoffs_stats2[['Player', 'Position', 'Age', 'Team', 'GamesPlayed', 'MinutesPlayed', 'MinutesPerGame', 'FieldGoals',
'FieldGoalsPerGame', 'FieldGoalsAttempts', 'FieldGoals%', '3Pointers', '3PointerPerGame', '3Pointer_Attempts',
'3Pointers%','2Pointers', '2PointerPerGame', '2Pointer_Attempts','2Pointers%', 'EffectiveFieldGoals%',
'TotalPoints', 'TotalPointsPerGame', 'Year','Playoff']]
overall_nba_playoffs_stats2.columns
In the final step, we make sure the dataframe has the number of rows and columns expected after the cleanup process, and we make a last check that there are no null values in the columns. There were 3196 rows and 32 columns after joining with the reference data. We then removed 13 irrelevant columns and added 5 calculated columns for better analysis and interpretation of the data, leaving 24 columns. A total of 24 rows were eliminated. After the phase one cleanup, 3172 rows and 24 columns will be used for data visualization in the next phase.
overall_nba_playoffs_stats2.shape
overall_nba_playoffs_stats2.isnull().sum()
overall_nba_playoffs_stats2.info()
overall_nba_playoffs_stats2.to_csv('nba_stats_final_phase1.csv', header = True, mode = 'w', index=False)
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
import plotly.express as px
%matplotlib inline
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
df = pd.read_csv('nba_stats_final_phase1.csv')
df.info()
df.head()
columns = ['TotalPointsPerGame','FieldGoalsPerGame', '3PointerPerGame', '2PointerPerGame', 'MinutesPerGame']
df_corr = df[columns]
# Compute the pairwise correlation matrix for the five per-game columns
corrmat = df_corr.corr()
# Set the figure size
f, ax = plt.subplots(figsize=(9, 6))
# Draw the heatmap with the 'RdYlGn' colormap; other colormaps are listed at
# https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html
sns.heatmap(corrmat, vmax=.8, square=True, annot=True, cmap='RdYlGn', linewidths=.5 )
plt.title('Heatmap NBA Statistics Analysis')
# Save the heatmap to an image file
plt.savefig('Correlation Heat Map NBA Statistics')
Explanation of how the visual cues of the heatmap represent the correlations.
TotalPointsPerGame Correlations Analysis:
In the correlation matrix, dark green shows a high positive correlation between total points per game and three other columns: field goals, 2-pointers, and minutes played per game. This relationship makes sense because a player's total points per game depend on how many minutes the player is on the court and on how many 2-point and 3-point field goals the player makes. Similarly, light green shows a slight positive correlation with 3-pointers per game, since the number of 3-pointers a player makes may be small depending on the player's position.
FieldGoalsPerGame Correlations Analysis:
In the field goals per game row, dark green again shows a high positive correlation between field goals per game and three other columns: total points, 2-pointers, and minutes per game. As with total points, field goals depend on how many 2-pointers the player attempted and made, which are tallied into total points, and the number of minutes played also matters. Yellow shows a moderate, near-neutral correlation between field goals and 3-pointers per game. Because field goals include both 2-point and 3-point shots, there are two possible explanations: either the player attempted more 2-pointers per game, or the player missed many 3-pointers per game during the season.
3PointerPerGame Correlations with TotalPointsPerGame, FieldGoalsPerGame and MinutesPerGame Analysis:
Three-pointers per game show a high positive relationship, in dark green, with total points and minutes per game, because they depend on how many minutes the player has played and because made three-pointers are tallied into total points per game. On the other hand, three-pointers per game show a moderate, near-neutral correlation with field goals per game, because players may not be making three-pointers in proportion to the field goals they attempt per game.
2PointerPerGame Correlations with TotalPointsPerGame, FieldGoalsPerGame and MinutesPerGame Analysis:
Two-pointers per game show a high positive relationship, in dark green, with total points and field goals per game, because two-pointers make up a large share of the field goals players make per game and are tallied into total points per game. Minutes per game show a medium positive relationship, because the number of minutes a player is on the court can affect their shooting either positively or negatively.
3PointerPerGame Correlations with 2PointerPerGame Analysis:
Three-pointers and two-pointers per game show a strong negative correlation, in red, as they are two different shooting categories and do not depend on each other in their contribution to a player's overall statistics.
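The colors in the heatmap are only a rendering of the numbers in the correlation matrix, so the annotated values are worth reading directly. The sketch below computes Pearson correlations (the default of `DataFrame.corr`) on a tiny made-up frame; the values are hypothetical and chosen only to show the mechanics.

```python
import pandas as pd

# Hypothetical per-game values for four players.
sample = pd.DataFrame({
    "MinutesPerGame": [10.0, 20.0, 30.0, 35.0],
    "TotalPointsPerGame": [3.0, 8.0, 15.0, 25.0],
    "3PointerPerGame": [1.0, 0.0, 2.0, 1.0],
})

# Pearson correlation between every pair of columns; values lie in [-1, 1].
corr = sample.corr()
print(corr.round(2))

# Read one pair directly instead of judging it from the color.
pair = corr.loc["MinutesPerGame", "TotalPointsPerGame"]
print(round(pair, 2))
```

Because the heatmap call sets `vmax=.8`, every correlation of 0.8 or higher is drawn in the same darkest green, so differences among the strongest pairs are visible only in the annotations.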
# figsize must be passed to plt.subplots; assigning plt.figsize has no effect
f, axes = plt.subplots(1, 4, figsize=(10, 20))
# Passing the column as y draws a vertical box plot
sns.boxplot(y=df['GamesPlayed'], color='forestgreen', ax=axes[0])
sns.boxplot(y=df['Age'], color='darkorange', ax=axes[1])
sns.boxplot(y=df['MinutesPerGame'], color='lightgray', ax=axes[2])
sns.boxplot(y=df['FieldGoalsPerGame'], color='dimgrey', ax=axes[3])
plt.tight_layout()
# figsize must be passed to plt.subplots; assigning plt.figsize has no effect
f, axes = plt.subplots(1, 3, figsize=(10, 20))
# Passing the column as y draws a vertical box plot
sns.boxplot(y=df['2PointerPerGame'], color='darkviolet', ax=axes[0])
sns.boxplot(y=df['3PointerPerGame'], color='darksalmon', ax=axes[1])
sns.boxplot(y=df['TotalPointsPerGame'], color='lavender', ax=axes[2])
plt.tight_layout()
Explain Outliers: GamesPlayed, Age, MinutesPerGame and FieldGoalsPerGame
There are no significant outliers in the GamesPlayed box plot. The outliers in the Age box plot are accurate: the upper extreme for an NBA player's age is 38, and the points beyond it are real players, since the data contain several players around age 40 who are still in the NBA and on teams that make the playoffs. There are no significant outliers in the MinutesPerGame box plot. The FieldGoalsPerGame box plot also shows accurate outliers, because every season produces a group of players with superior offensive statistics who score around 9 to 11 field goals per game. Note that FieldGoalsPerGame includes both two-point and three-point shots, and depending on position a player may shoot mostly two-pointers or mostly three-pointers.
Explain Outliers: 2PointerPerGame, 3PointerPerGame, and TotalPointsPerGame
The outliers in all of these columns are accurate as represented by the box plots. For 2PointerPerGame, the outliers beyond the upper extreme are around 7 to 10 two-pointers per game. Similarly, the 3PointerPerGame box plot marks players making around 3 to 5 three-pointers per game as outliers. Finally, the TotalPointsPerGame box plot looks reasonably accurate for our data: some players were injured or played fewer minutes per game than others, which pulls the upper extreme down so that players scoring more than 22 points per game appear as outliers.
Although the overwhelming majority of players post far lower numbers in these shooting categories, the outliers play a vital role in our analysis because they are the NBA players leading in average two-pointers, three-pointers, and total points per game. Analyzing these outliers helps us understand and answer our hypothesis.
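A box plot marks a point as an outlier when it falls beyond the whiskers, which by default reach 1.5 times the interquartile range (IQR) past the quartiles. The sketch below applies that rule directly so the flagged players can be listed; `iqr_outliers` is a hypothetical helper and `toy_points` contains made-up values, not rows from the dataset.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying beyond the box plot whiskers (Q1 - k*IQR, Q3 + k*IQR)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical per-game scoring averages, not taken from the dataset.
toy_points = pd.Series([4.0, 6.5, 7.0, 8.0, 9.5, 10.0, 11.0, 12.5, 30.0])

# Q1 = 7.0, Q3 = 11.0, IQR = 4.0, so the upper whisker limit is 17.0.
high_scorers = iqr_outliers(toy_points)
print(high_scorers.tolist())  # [30.0]
```

Calling `iqr_outliers(df['TotalPointsPerGame'])` would, under these assumptions, return the same players that appear as dots above the upper whisker.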
plt.figure(figsize=(10, 6))
sns.swarmplot(x=df['3PointerPerGame'], y=df['TotalPointsPerGame'], palette='Spectral', hue=df['Position'])
plt.title("TotalPointsPerGame vs 3PointerPerGame", size=15)
plt.legend(bbox_to_anchor=(1.0, 1), loc=2, borderaxespad=1)
NBA Player Position
1. PF - Power Forwards
2. SG - Shooting Guards
3. SF - Small Forwards
4. C - Center
5. PG - Point Guards
Explanation of Swarm Plot
A swarm plot shows individual points more clearly than a scatter plot while still grouping them by category like a bar plot. Because we are comparing total points per game with three-pointers per game, the swarm plot lets us see which positions have made more three-pointers per game in our historical data. The red dots correspond to the Shooting Guard (SG) position: players in this position reach a high range of around 2 to 4 three-pointers per game, which goes with an average of 10 to 25 total points per game, compared with positions that sit in a low range of 0 to 1 three-pointers per game.
The light orange dots correspond to the Point Guard (PG) position: players in this position reach a high range of around 3 to 5 three-pointers per game, which goes with an average of 10 to 35 total points per game.
It is interesting to note that players listed at multiple positions (e.g., PF-C, SF-SG, PG-SG, PF-SF) do not score a significant number of total points per game. This suggests that players assigned multiple positions may have other, more specialized responsibilities than the traditional NBA positions listed above.
plt.figure(figsize=(10, 6))
sns.swarmplot(x=df['2PointerPerGame'], y=df['TotalPointsPerGame'], palette='terrain', hue=df['Position'])
plt.title("TotalPointsPerGame vs 2PointerPerGame", size=15)
plt.legend(bbox_to_anchor=(1.0, 1), loc=2, borderaxespad=1)
NBA Player Position
1. PF - Power Forwards
2. SG - Shooting Guards
3. SF - Small Forwards
4. C - Center
5. PG - Point Guards
Explanation of Swarm Plot
Based on the swarm plot, the green and light green dots correspond to the Center (C) and Point Guard (PG) positions. Players in these positions reach a high range of around 2 to 7 two-pointers per game, which goes with an average of 5 to 30 total points per game.
The blue dots correspond to the Power Forward (PF) position: players in this position range from around 1 to 6 two-pointers per game, which goes with an average of 5 to 20 total points per game. It is also interesting to note that players listed at multiple positions (e.g., PF-C, SF-SG, PG-SG, PF-SF) do not score a significant number of total points per game, possibly because they play those combined positions only occasionally during a season rather than regularly.
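The ranges read off the swarm plots by eye can be cross-checked with a per-position summary. This is a small sketch, assuming the `Position`, `2PointerPerGame`, and `TotalPointsPerGame` columns used above; the `toy` rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the column names follow the notebook.
toy = pd.DataFrame({
    "Position":           ["C", "C", "PG", "PG", "PF", "PF"],
    "2PointerPerGame":    [2.0, 7.0, 3.0, 6.5, 1.0, 6.0],
    "TotalPointsPerGame": [5.0, 22.0, 9.0, 30.0, 5.5, 20.0],
})

# Minimum and maximum of each statistic within every position.
ranges = (
    toy.groupby("Position")[["2PointerPerGame", "TotalPointsPerGame"]]
       .agg(["min", "max"])
)
print(ranges)
```

The result has one row per position and a (statistic, min/max) column pair, so a statement such as "2 to 7 two-pointers per game" can be compared against `ranges.loc["C", "2PointerPerGame"]`.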
import plotly.express as px  # px.pie comes from plotly.express, not graph_objs
fig = px.pie(df, values='3Pointers', names='Position',
             title='NBA Players Total 3 Pointers Statistics Season 2015-2020',
             # labels must be a dict mapping column name to display label
             hover_data=['Position'], labels={'3Pointers': '3 Pointers'})
fig.update_traces(textposition='inside', textinfo='percent+label')
Hover over Data on NBA Player Position
1. PF - Power Forwards
2. SG - Shooting Guards
3. SF - Small Forwards
4. C - Center
5. PG - Point Guards
Explanation of Pie Chart: NBA Player 3 Pointers Statistics Per Season from 2015-2020 Categorized by Positions
To check whether my hunch is correct that players in the Point Guard (PG) and Shooting Guard (SG) positions make more three-pointers on average than other positions, we look at the broader data with a pie chart of all three-pointers made by NBA players over the 2015 to 2020 seasons. It is sliced into the percentage contributed by each NBA position.
Shooting Guard (SG) and Point Guard (PG) Analysis:
We can see that 31.6% of the three-pointers were made by players in the Shooting Guard (SG) position, a total of 43,471 three-pointers in our historical data. Similarly, 22.5% were made by players in the Point Guard (PG) position, a total of 30,924 three-pointers over the five regular seasons. This clearly shows that Shooting Guards made more three-pointers than any other position in the NBA over this period.
Small Forwards (SF) and Power Forwards (PF) Analysis:
On the other hand, players in the Small Forward (SF) position made 27,797 three-pointers in total, which is 20.2% of the data. The pie chart also shows that 18.3% of three-pointers were made by players in the Power Forward (PF) position. It is surprising to see so little difference between Small Forwards, Power Forwards, and Point Guards in the three-point category. This suggests that a player's position does not strictly determine his scoring style; players may be working to improve in both categories over the years, which would explain the close percentages.
Center (C) and Multiple Position Analysis:
Centers (C) and the multiple-position listings have a combined share of only 7.4% of three-pointers over the five-year data, which makes them the outliers here; consistent with our assumption, they are more likely to attempt two-pointers than three-pointers. Players listed at multiple positions also do not make a significant number of three-pointers, possibly because they play those combined positions only occasionally during a season rather than regularly.
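The percentages shown on the pie slices are each position's summed three-pointers divided by the overall total. A minimal sketch of that arithmetic, assuming the `Position` and `3Pointers` columns from the plotting code; the `toy` totals are hypothetical.

```python
import pandas as pd

# Hypothetical season totals; only the column names follow the notebook.
toy = pd.DataFrame({
    "Position":  ["SG", "SG", "PG", "SF", "PF", "C"],
    "3Pointers": [120, 80, 100, 60, 30, 10],
})

# Sum made three-pointers per position, then divide by the league-wide total.
totals = toy.groupby("Position")["3Pointers"].sum()
share_pct = (totals / totals.sum() * 100).round(1).sort_values(ascending=False)
print(share_pct)
```

Running the same two lines on `df` would reproduce both the counts and the percentages quoted in the analysis, and swapping in `'2Pointers'` gives the figures for the two-pointer chart.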
import plotly.express as px
fig = px.pie(df, values='2Pointers', names='Position',
             title='NBA Players Total 2 Pointers Statistics Season 2015-2020',
             # labels must be a dict mapping column name to display label
             hover_data=['Position'], labels={'2Pointers': '2 Pointers'})
fig.update_traces(textposition='inside', textinfo='percent+label')
Hover over Data on NBA Player Position
1. PF - Power Forwards
2. SG - Shooting Guards
3. SF - Small Forwards
4. C - Center
5. PG - Point Guards
Explanation of Pie Chart: NBA Player 2 Pointers Statistics Per Season from 2015-2020 Categorized by Positions
To check whether my hunch is correct that players in the Power Forward (PF), Small Forward (SF), and Center (C) positions make more two-pointers on average than other positions, we look at the broader data with a pie chart of all two-pointers made by NBA players over the 2015 to 2020 seasons. It is sliced into the percentage contributed by each NBA position.
Center (C) and Point Guard (PG) Analysis:
We can see that 24.2% of the two-pointers were made by players in the Center (C) position, a total of 92,408 two-pointers in our historical data. Similarly, 20.3% were made by players in the Point Guard (PG) position, a total of 77,449 two-pointers over the five regular seasons. This clearly shows that Centers made more two-pointers than any other position in the NBA over this period.
Power Forwards (PF) and Shooting Guards (SG) Analysis:
On the other hand, players in the Power Forward (PF) position made 74,804 two-pointers in total, which is 19.6% of the data. The pie chart also shows that 19.6% of two-pointers were made by players in the Shooting Guard (SG) position. It is surprising to see a tie between Power Forwards and Shooting Guards in the two-point category. This suggests that a player's position does not strictly determine his scoring style; players may be working to improve in both categories over the years, which would explain the tie in percentages between these two positions.
Small Forwards (SF) and Multiple Position Analysis:
Small Forwards (SF) and the multiple-position listings have a combined share of only 16.3% of two-pointers over the five-year data. This makes Small Forwards the outlier and shows that our hunch was incorrect: we expected them to make more two-pointers than three-pointers, so there appears to be no clear relationship between a player's position and his offensive shooting category. The multiple-position listings contributed only 0.7% of the data, which indicates that they do not make a significant number of two-pointers, possibly because players fill those combined positions only occasionally during a season rather than regularly.
import plotly.express as px
fig = px.line_polar(df, r='3PointerPerGame', theta='Year',color='Player', hover_name='Playoff', line_close=True, width=800, height=500)
fig.show()
Hover over Year Angle for Leading Scorers in 3 Pointer Per Game Category
1. 2015-2016 shows green line
2. 2016-2017 shows green line
3. 2017-2018 shows green and pink line
4. 2018-2019 shows green and pink line
5. 2019-2020 represents orange line
Polar Line Plot Analysis:
The polar line plot helps us see which NBA player led the three-pointers-per-game category in each season from 2015 to 2020: each player's line reaches a year angle at the number of three-pointers he made per game that season. Hovering also shows whether the leading player made the playoffs.
Let's start with the 2015-2016 and 2016-2017 regular seasons. Hovering over the green line at these year angles shows that Steph Curry led the three-point category in both years and made the playoffs.
Similarly, there is a tie between James Harden (pink line) and Steph Curry (green line): both touch the 2017-2018 and 2018-2019 year angles, with leading averages of 4 and 5 three-pointers per game for those years.
Finally, Damian Lillard (orange line) is the leading scorer with 4 three-pointers per game in 2019-2020. The data reveal that all 3 NBA players who led the three-point category made the playoffs. To test this further, we have to look at the two-pointers-per-game category as well and see whether the pattern persists.
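The season leaders, including ties, can also be pulled out without hovering over the plot. The sketch below assumes the `Player`, `Year`, `Playoff`, and `3PointerPerGame` columns used in the plotting code; the `toy` rows and single-letter player names are placeholders.

```python
import pandas as pd

# Hypothetical rows; player names are placeholders.
toy = pd.DataFrame({
    "Player":          ["A", "B", "C", "A", "B", "C"],
    "Year":            ["2017-2018"] * 3 + ["2018-2019"] * 3,
    "Playoff":         ["Y", "Y", "N", "Y", "N", "N"],
    "3PointerPerGame": [4.0, 4.0, 2.5, 5.0, 3.0, 3.5],
})

# Per-season maximum, broadcast back to every row so that tied leaders are all kept.
season_max = toy.groupby("Year")["3PointerPerGame"].transform("max")
leaders = toy[toy["3PointerPerGame"] == season_max]
print(leaders[["Year", "Player", "3PointerPerGame", "Playoff"]])
```

In this toy data, players A and B tie in 2017-2018 and A leads alone in 2018-2019. Changing the column name to `'2PointerPerGame'` gives the corresponding table for the two-pointer category.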
import plotly.express as px
fig = px.line_polar(df, r='2PointerPerGame', theta='Year',color='Player', hover_name='Playoff', line_close=True, width=800, height=500)
fig.show()
Hover over Year Angle for Leading Scorers in 2 Pointer Per Game Category
1. 2015-2016 shows red and orange line
2. 2016-2017 shows red line
3. 2017-2018 shows red line
4. 2018-2019 shows red and blue line
5. 2019-2020 represents pink line
Polar Line Plot Analysis:
The polar line plot helps us see which NBA player led the two-pointers-per-game category in each season from 2015 to 2020: each player's line reaches a year angle at the number of two-pointers he made per game that season. Hovering also shows whether the leading player made the playoffs.
Let's start with the 2015-2016, 2016-2017, 2017-2018, and 2018-2019 regular seasons. Hovering over the red line at these year angles shows that Anthony Davis led the two-point category in these four years but made the playoffs only once in that span.
In addition, there is a tie between LaMarcus Aldridge (orange line) and Anthony Davis (red line): both touch the 2015-2016 year angle with a leading average of 9 two-pointers per game, but only LaMarcus Aldridge made the playoffs that year, not Anthony Davis.
Similarly, there is a tie between Giannis Antetokounmpo (blue line) and Anthony Davis (red line): both touch the 2018-2019 year angle with a leading average of 9 two-pointers per game, but only Giannis Antetokounmpo made the playoffs that year, not Anthony Davis.
Finally, Russell Westbrook (pink line) is the leading scorer with 10 two-pointers per game in 2019-2020. The data reveal that only 2 out of 3 NBA players who led the two-point category made the playoffs. So our assumption was wrong: the pattern from the previous polar plot, that players leading a shooting category always make the playoffs, did not persist.
import plotly.express as px
fig = px.scatter(
df, x='MinutesPerGame', y='GamesPlayed', size='TotalPoints', size_max=13, color_continuous_scale='rdylbu_r',
color='Playoff', hover_name='Player', trendline="ols", title='GamesPlayed vs MinutesPerGame')
fig.update_traces(marker=dict(line=dict(width=1, color='DarkSlateGrey')),
selector=dict(mode='markers'))
Hover Over Data For Additional Information
1. Player Name
2. Total Points Scored from 2015-2020
3. Games Played
4. Minutes Per Game
5. N - Did not make Playoffs (Blue Dots)
6. Y - Made it to the Playoffs (Red Dots)
Scatter Plot Analysis:
The main goal of this scatter plot is to see whether the data indicate any correlation between GamesPlayed and MinutesPerGame, and how that affects a player's chances of making the playoffs.
Following the trendline, we see many red dots in the upper right quadrant of the plot: NBA players who played around 75 to 80 games and averaged more than 30 minutes per game had a better chance of making the playoffs and of scoring more points, based on the historical data.
On the other hand, the trendline for the blue dots runs from the lower left quadrant to the middle of the upper right quadrant: NBA players who played around 20 to 65 games and averaged fewer than 30 minutes per game had a lower chance of making the playoffs and of scoring points, based on the historical data.
But there are some exceptions. For instance, there are blue dots in the upper right quadrant, where some NBA players played all 82 games or more than 75 games and averaged more than 30 minutes per game but still did not make the playoffs. So, based on our analysis, we can conclude that GamesPlayed and MinutesPerGame have a neutral correlation with playoff chances, with a dependency on overall team statistics, which also affect a player's chances of making the playoffs.
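The visual impression from the scatter plot can be backed by two numbers: the correlation between games played and minutes per game, and the playoff rate on either side of the 30-minute mark. This is a rough sketch, assuming the `GamesPlayed`, `MinutesPerGame`, and `Playoff` (`'Y'`/`'N'`) columns described above; the `toy` rows and bucket labels are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the column names and the Y/N coding follow the notebook.
toy = pd.DataFrame({
    "GamesPlayed":    [82, 78, 75, 60, 45, 30, 80, 25],
    "MinutesPerGame": [34.0, 32.5, 31.0, 24.0, 18.0, 12.0, 33.0, 10.0],
    "Playoff":        ["Y", "Y", "N", "N", "Y", "N", "Y", "N"],
})

# Pearson correlation between the two axes of the scatter plot.
r = toy["GamesPlayed"].corr(toy["MinutesPerGame"])

# Share of players on playoff teams, split at 30 minutes per game.
bucket = (toy["MinutesPerGame"] > 30).map({True: "over 30 min", False: "30 min or less"})
playoff_rate = toy["Playoff"].eq("Y").groupby(bucket).mean()

print(round(r, 2))
print(playoff_rate)
```

A gap between the two playoff rates that is large on `df` would support the reading of the upper right quadrant, while the exceptions noted above show up as a rate below 100% in the "over 30 min" bucket.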
Since the 3PointerPerGame shooting category was highly correlated with overall player statistics, we will use the 3PointerPerGame average to find the top 50 players in the three-point shooting category. The following heat map analysis will help reveal whether our hunch is correct that NBA players who average a high number of three-pointers per game are more likely to be on teams that make the NBA playoffs.
Let's look at the top 50 individual NBA player statistics, grouped by team, position, playoff status, and year, and sorted by average three-pointers per game.
# look at heat map analysis on overall mean 3 Pointer Per Game for five-year historical data
nba_3Pointer =df.groupby(['Player','Team','Position','Playoff','Year']).agg({'3PointerPerGame':['mean']})
nba_3Pointer.columns = ['3PointerPerGame_Mean']
players_avg_3PointerPerGame = nba_3Pointer.sort_values(by=['3PointerPerGame_Mean'], ascending = False)[:50]
players_avg_3PointerPerGame.style.background_gradient(cmap = 'Blues')
Explanation of Heat Map Analysis: Top 50 NBA Player Statistics based on Average 3 Pointer Per Game
Among the top 50 entries, 5 three-pointers per game is the highest average in the three-point shooting category, reached by Steph Curry and James Harden. Both were on teams that made the playoffs, and both have been consistent in their performance over the years based on the data.
Looking at the broader picture in the heat map, some players appear more than once, either because they played for different teams or because they performed consistently well, sometimes at different positions, in different years. But the main point is that 23 of the 50 entries (46%) were not on teams that made the playoffs, which suggests that our hunch is incorrect that players who average a high number of three-pointers per game are more likely to be on teams that make the NBA playoffs. To test this further, we will narrow the data down to the top 10 NBA players by average three-pointers per game.
Top 10 NBA Player Statistics based on Average 3 Pointer Per Game
players_avg_3PointerPerGame.sort_values(by=['3PointerPerGame_Mean'], ascending = False)[:10]
Explanation of Heat Map Analysis: Top 10 NBA Player Statistics based on Average 3 Pointer Per Game
Now that we have narrowed the data down to the top 10 NBA players by average three-pointers per game, we can see that some players have performed consistently well in their individual shooting statistics. Steph Curry, for example, led in 2018 and 2015 and was on teams that made the playoffs. Similarly, James Harden tied that shooting average of 5 three-pointers per game.
On the other hand, there are players such as D'Angelo Russell, who scored consistently with 4 three-pointers per game in 2019 for two different teams but did not make the playoffs. In addition, the data reveal that 4 out of 10 players leading the three-point category were not on teams that made the playoffs, which is 40% of the data above. This further suggests that individual player statistics do not correlate with always making the playoffs or being on a team that makes the playoffs.
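Counts such as "4 out of 10" can be computed rather than tallied by hand from the heat map. The sketch below assumes a flat frame with one row per player-season and a `Playoff` column coded `'Y'`/`'N'`; `non_playoff_share` is a hypothetical helper and the `toy` rows are made up.

```python
import pandas as pd

def non_playoff_share(frame: pd.DataFrame, stat: str, n: int) -> float:
    """Fraction of the n highest rows in `stat` whose team missed the playoffs."""
    top = frame.nlargest(n, stat)
    return float(top["Playoff"].eq("N").mean())

# Hypothetical player-seasons; names and values are placeholders.
toy = pd.DataFrame({
    "Player":          list("ABCDEFGH"),
    "Playoff":         ["Y", "N", "Y", "Y", "N", "N", "Y", "N"],
    "3PointerPerGame": [5.1, 4.8, 4.4, 4.0, 3.9, 3.1, 2.0, 1.5],
})

share = non_playoff_share(toy, "3PointerPerGame", 5)
print(f"{share:.0%} of the top 5 missed the playoffs")  # 40% of the top 5 missed the playoffs
```

The same helper works for any of the per-game columns and any cutoff, so the top 50 and top 10 figures for each category could come from one call each.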
We should now analyze the 2PointerPerGame shooting category to see whether the data reveal patterns similar to the three-point category. We will use the 2PointerPerGame average to find the top 50 players in the two-point shooting category. The following heat map analysis will help reveal whether our hunch is correct that NBA players who average a high number of two-pointers per game are more likely to be on teams that make the NBA playoffs.
Let's look at the top 50 individual NBA player statistics, grouped by team, position, playoff status, and year, and sorted by average two-pointers per game.
# look at heat map analysis on overall mean 2 Pointer Per Game for five-year historical data
nba_2Pointer =df.groupby(['Player','Team','Position','Playoff','Year']).agg({'2PointerPerGame':['mean']})
nba_2Pointer.columns = ['2PointerPerGame_Mean']
players_avg_2PointerPerGame = nba_2Pointer.sort_values(by=['2PointerPerGame_Mean'], ascending = False)[:50]
players_avg_2PointerPerGame.style.background_gradient(cmap = 'RdYlGn')
Explanation of Heat Map Analysis: Top 50 NBA Player Statistics based on Average 2 Pointer Per Game
Among the top 50 entries, 10 two-pointers per game is the highest average in the two-point shooting category, reached by Anthony Davis and Russell Westbrook. Both were on teams that made the playoffs, and both have been consistent in their performance over the years based on the data. At the same time, Anthony Davis also missed the playoffs in other seasons while playing for the same team. This shows that a good performance in the two-point category does not guarantee a playoff spot.
Looking at the broader picture in the heat map, players like Anthony Davis, LeBron James, and Russell Westbrook appear more than once, either because they played for different teams or because they performed consistently well, sometimes at different positions, in different years. But the main point is that 18 of the 50 entries (36%) were not on teams that made the playoffs, which suggests that our hunch is incorrect that players who average a high number of two-pointers per game are more likely to be on teams that make the NBA playoffs. To test this further, we will narrow the data down to the top 10 NBA players by average two-pointers per game.
Top 10 NBA Player Statistics based on Average 2 Pointer Per Game
players_avg_2PointerPerGame.sort_values(by=['2PointerPerGame_Mean'], ascending = False)[:10]
Explanation of Heat Map Analysis: Top 10 NBA Player Statistics based on Average 2 Pointer Per Game
Now that we have narrowed the data down to the top 10 NBA players by average two-pointers per game, we can see that some players have performed consistently well in their individual shooting statistics. Anthony Davis, for example, led in 2017 and 2016 and was on teams that made the playoffs. At the same time, Anthony Davis averaged 9 two-pointers per game in 2018 and 2015 but did not make the playoffs with the same team.
On the other hand, two players, Karl-Anthony Towns and DeMar DeRozan, each averaged 9 two-pointers per game in the same 2016-2017 season for two different teams, but only one of them made the playoffs. The data indicate that overall team performance plays a more vital role in making the playoffs than individual player statistics alone.
In addition, the data reveal that 4 out of 10 players leading the two-point category were not on teams that made the playoffs, which is 40% of the data above. This further suggests that individual player statistics do not correlate with always making the playoffs or being on a team that makes the playoffs.
We should now analyze the FieldGoalsPerGame category to see whether the data reveal patterns similar to the two-point category. We will use the FieldGoalsPerGame average to find the top 50 players in the field goals category. This column includes both two-point and three-point shots, and depending on position a player may shoot mostly two-pointers or mostly three-pointers. The following heat map analysis will help reveal whether our hunch is correct that NBA players who average a high number of field goals per game are more likely to be on teams that make the NBA playoffs.
Let's look at the top 50 individual NBA player statistics, grouped by team, position, playoff status, and year, and sorted by average field goals per game.
# look at heat map analysis on overall mean Field Goals Per Game for five-year historical data
nba_fieldGoals =df.groupby(['Player','Team','Position','Playoff','Year']).agg({'FieldGoalsPerGame':['mean']})
nba_fieldGoals.columns = ['FieldGoalsPerGame_Mean']
players_avg_fieldGoalsPerGame = nba_fieldGoals.sort_values(by=['FieldGoalsPerGame_Mean'], ascending = False)[:50]
players_avg_fieldGoalsPerGame.style.background_gradient(cmap = 'coolwarm')
Explanation of Heat Map Analysis: Top 50 NBA Player Statistics based on Field Goals Per Game
Among the top 50 entries, 11 field goals per game is the highest average in the field goals category, reached by James Harden and Russell Westbrook, who were on the same team, and by Giannis Antetokounmpo, who was on a different team. All of them made the playoffs and have been consistent in their performance over the years based on the data.
Looking at the broader picture in the heat map, players like LeBron James, Kyrie Irving, and Kevin Durant appear more than twice, either because they played for different teams or because they performed consistently well, sometimes at different positions, in different years. But the main point is that 10 of the 50 entries (20%) were not on teams that made the playoffs.
Since the field goals column includes both two-pointers and three-pointers, this suggests that our hunch is incorrect that players who average a high number of three-pointers or two-pointers alone are more likely to be on teams that make the NBA playoffs; according to the field goals per game analysis above, players need to score well in both shooting categories. To test this further, we will narrow the data down to the top 10 NBA players by average field goals per game.
Top 10 NBA Player Statistics based on Average Field Goals Per Game
players_avg_fieldGoalsPerGame.sort_values(by=['FieldGoalsPerGame_Mean'], ascending = False)[:10]
Explanation of Heat Map Analysis: Top 10 NBA Player Statistics based on Average Field Goals Per Game
Now that we have narrowed the data down to the top 10 NBA players by average field goals per game, we can see that some players have performed consistently well in their individual shooting statistics. Giannis Antetokounmpo, for example, led in 2019 and 2018 while on the same team, which made the playoffs. Similarly, James Harden and Russell Westbrook tied at an average of 11 field goals per game on the same team.
On the other hand, there are players such as Anthony Davis, who scored consistently with 10 field goals per game in 2016 and 2017 with the same team but made the playoffs only once. In addition, the data reveal that 1 out of 10 NBA players in the above heat map analysis was not on a team that made the playoffs, which is 10% of the data above. This further suggests that individual player statistics in the field goals per game category do correlate with the chances of making the playoffs or being on a team that makes the playoffs.
We should now analyze the TotalPointsPerGame category to see whether the data reveal patterns similar to the field goals category. We will use the TotalPointsPerGame average to find the top 50 players in the total points per game category. The following heat map analysis will help reveal whether our hunch is correct that NBA players who average a high number of total points per game are more likely to be on teams that make the NBA playoffs.
Let's look at the top 50 individual NBA player statistics, grouped by team, position, playoff status, and year, and sorted by average total points per game.
# look at heat map analysis on overall mean Total Points Per Game for five-year historical data
nba_totalpoints =df.groupby(['Player', 'Team','Playoff','Position','Year']).agg({'TotalPointsPerGame':['mean']})
nba_totalpoints.columns = ['TotalPointsPerGame_Mean']
players_avg_totalpointsPerGame = nba_totalpoints.sort_values(by=['TotalPointsPerGame_Mean'],ascending = False)[:50]
players_avg_totalpointsPerGame.style.background_gradient(cmap = 'CMRmap')
Explanation of Heat Map Analysis: Top 50 NBA Player Statistics based on Total Points Per Game
Among the top 50 entries, 36 total points per game is the highest average in the total points category, scored by James Harden, while Russell Westbrook averaged 32 total points per game. They were on different teams, both of which made the playoffs, and both have been consistent in their performance over the years based on the data. At the same time, Bradley Beal and Trae Young were also among the top 4 scorers in the total points category in the 2019-2020 season, but their teams did not make the playoffs. This shows that a good performance in the total points per game category does not guarantee a playoff spot.
Looking at the broader picture in the heat map, players like LeBron James, Anthony Davis, Bradley Beal, and Kevin Durant appear more than twice, either because they played for different teams or because they performed consistently well, sometimes at different positions, in different years. But the main point is that 16 of the 50 entries (32%) were not on teams that made the playoffs. This suggests that our hunch is correct for total points: the total points per game analysis above shows that players need to score well in both shooting categories. To test this further, we will narrow the data down to the top 10 NBA players by average total points per game.
Top 10 NBA Player Statistics based on Average Total Points Per Game
players_avg_totalpointsPerGame.sort_values(by=['TotalPointsPerGame_Mean'], ascending = False)[:10]
Explanation of Heat Map Analysis: Top 10 NBA Player Statistics based on Average Total Points Per Game
Now that we have narrowed the data down to the top 10 NBA players by average total points per game, we can see that some players have performed consistently well in their individual statistics. James Harden, for example, led in 2019 and 2018 while on the same team, which made the playoffs. Similarly, Russell Westbrook averaged 32 total points per game on a different team that made the playoffs.
On the other hand, there are players like Bradley Beal, who scored 31 total points per game, and Trae Young, who scored 30 total points per game, in the same 2019-2020 regular season with different teams, and neither made the playoffs. In addition, the data reveal that 2 out of 10 players from the above data were not on teams that made the playoffs, which is 20% of the data above. This further suggests that individual player statistics in the total points per game category do correlate with the chances of making the playoffs, but those chances also depend on overall team statistics.
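The percentages in the heat map sections are easier to interpret next to a baseline, namely the share of all rows in the data whose team made the playoffs. The sketch below compares the playoff rate among the top rows of each statistic with that baseline. It assumes one row per player-season and a `Playoff` column coded `'Y'`/`'N'`; the `toy` values, the cutoff of 4, and the `summary` column names are hypothetical.

```python
import pandas as pd

# Hypothetical player-seasons; only the column names follow the notebook.
toy = pd.DataFrame({
    "Playoff":            ["Y", "Y", "N", "Y", "N", "N", "Y", "N", "N", "Y"],
    "3PointerPerGame":    [4.5, 1.0, 4.0, 0.5, 3.5, 3.0, 2.0, 1.5, 0.2, 2.5],
    "TotalPointsPerGame": [30.0, 28.0, 12.0, 27.0, 10.0, 9.0, 25.0, 8.0, 7.0, 26.0],
})

# Baseline: playoff share over every row, regardless of scoring.
baseline = toy["Playoff"].eq("Y").mean()

rows = []
for stat in ["3PointerPerGame", "TotalPointsPerGame"]:
    top = toy.nlargest(4, stat)                 # top rows for this statistic
    rate = top["Playoff"].eq("Y").mean()        # playoff share among them
    rows.append({"stat": stat,
                 "top_playoff_rate": rate,
                 "lift_vs_baseline": rate - baseline})

summary = pd.DataFrame(rows).set_index("stat")
print(f"baseline playoff rate: {baseline:.0%}")
print(summary)
```

A statistic whose top rows make the playoffs at roughly the baseline rate carries little information about playoff chances, while a clearly positive lift, as the field goals and total points sections report, points to a real association.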